Abstract: In this paper we review the randomized-algorithm-based
design philosophy for probabilistic robust controller synthesis and
the role of empirical process theory in designing such algorithms.
Among the existing polynomial-time randomized algorithms, some have
overly conservative sample complexity bounds, and consequently there
is considerable scope to improve those bounds.

I. INTRODUCTION 

In recent years it has been reported that several problems in
robustness analysis and synthesis are either NP-complete or NP-hard.
Surveys of NP-hard problems in control theory can be found in the
work of Blondel and Tsitsiklis [1], [2], Bernstein [3], Nemirovskii [4],
and Poljak and Rohn [5]. These results show that various problems in
robust control are practically unsolvable when the number of variables
becomes sufficiently large. The problem of checking robust stability
and performance under structured/parametric uncertainty is proven to
be NP-hard. Parametric uncertainty can also be handled in the
$H_\infty$ framework, but the resulting design is then too
conservative. The $\mu$ analysis/synthesis approach can handle
parametric uncertainty, but the computation of $\mu$ is again NP-hard.
Stability evaluation of interval matrices is itself an NP-hard
problem. For details of the computational complexity of algorithms
see [6].

Recent research on the computational complexity of robust control
analysis and design problems indicates that these difficulties are
most likely inherent to the problem formulations rather than due to a
lack of ingenuity. To overcome the intractability and conservatism
issues, a recent trend is to consider the problem of stability and
performance robustness in a probabilistic framework. One can refer to
[7] for details of these approaches. Among others, the algorithms
based on statistical learning theory, initiated by Vidyasagar [8], are
of considerable interest. In this framework one designs a controller
based on an average performance requirement as the plant varies over a
pre-specified family, rather than on worst-case performance
requirements. For such cases, the controller synthesis problem can be
formulated as the minimization of the expected value of the objective
function. Statistical learning theory shows that whenever a property
known as uniform convergence of empirical means holds, there exists an
efficient randomized algorithm for an associated function minimization
problem. Therefore, one can derive sample size estimates to
approximate a minimum of a function with arbitrary accuracy and
confidence. The approach can tackle a wide range of cost functions and
is not confined to convex functions. Some specific problems that fall
within this framework include robust stabilization of a family of
plants and minimization of weighted $H_2/H_\infty$ norms.

In this paper we present an overview of the application of
statistical learning theory in a robust control framework. The paper
is organized as follows: in Section II some intractable problems that
arise in control theory are reviewed. Section III contains the
probabilistic robust synthesis problem formulation. In Section IV
different notions of approximation of function minima are given. In
Section V we investigate the role of empirical process theory in
designing robust controllers. Sections VI and VII contain algorithms
based on different notions of near minimum or maximum. Finally, in
Section VIII we conclude and discuss the scope of future research.

II. NP Hard Problems in Controller Synthesis 

During the past decade researchers have studied the computational
complexity of controller analysis and synthesis problems.
Surprisingly, these studies showed that a number of analysis and
synthesis problems are NP-hard. Checking robust stability, robust
positive semi-definiteness, robust norm boundedness and robust
non-singularity were all proved to be NP-hard. These are all analysis
problems, and intuitively, if the analysis problem is NP-hard then the
corresponding synthesis problem is at least as difficult. In the
following we list some of the NP-hard synthesis problems.

A. Robust Stabilization Against Structured Perturbation 

Consider the problem of finding a fixed controller $K$ which
stabilizes an uncertain plant $P$ subjected to structured
perturbations around a nominal plant $P_0$. The well-known solution to
this problem is to check whether an associated structured singular
value is less than one. It is shown in [1] that answering this
decision problem is NP-hard. It is also shown in [2] that even finding
an approximate solution is NP-hard.

B. Constant Output Feedback Stabilization with Constraints 

For given state-space system matrices $A$, $B$, $C$, it is an NP-hard
problem to decide whether there exists a controller matrix $K$, with
elements bounded within given intervals, that stabilizes the
closed-loop system [1]. One can try to solve the problem using
Tarski-Seidenberg elimination theory, but the number of polynomial
inequalities grows exponentially with the dimension of the system.

C. Simultaneous Stabilization using Constant Output Feedback

For a family of plants (a family of $A$, $B$, $C$ matrices) it is an
NP-hard problem to decide whether there exists a single controller
matrix $K$, with elements bounded within given intervals, that
stabilizes every closed-loop system in the family [1].

These results lead to the realization that such innocent-looking
robust synthesis problems are in fact intractable. Randomized
algorithms offer a viable way to tackle this situation. Empirical
process theory (more widely known as statistical learning theory)
provides the justification for using such algorithms.

III. Probabilistic Robust Synthesis Problem 

Let there exist a family of plants $\{G(x), x \in X\}$ parameterized
by $x$, and a controller family $\{K(y), y \in Y\}$ parameterized by
$y$. Suppose $P$ is a probability measure on the set $X$, and let
$\epsilon \in \mathbb{R}$ be a given accuracy parameter.

In the probabilistic framework, instead of looking for a $K(y)$ which
satisfies worst-case performance for all plant instances, one settles
for a $K(y)$ that satisfies the performance index for most of the
plant instances, except possibly for those belonging to a set of
measure no larger than $\epsilon$. The performance function to be
minimized can be defined as

$$J := E\left[\psi(G(x), K(y))\right] \tag{1}$$



where $E(\cdot)$ denotes the expected value. Taking the expectation of
the cost functional with respect to $P$ captures the intuitive idea
that a controller is allowed to perform poorly for some plant
instances. Notice that the cost becomes a function of the controller
parameter alone once the underlying probability distribution $P$ of
the plant family is chosen, i.e.

$$f(y) := E_P\left[\psi(G(x), K(y))\right] \tag{2}$$



Let us define $g_y(x) := \psi(G(x), K(y))$. Without loss of
generality, for each $y \in Y$, $g_y(\cdot)$ maps $X$ into $[0, 1]$.
Let $\Gamma := \{g_y(\cdot), y \in Y\}$ be the associated family of
cost functions, one for each controller. The objective is to find
$y = y^*$ that minimizes $f(y)$.

IV. Notions of Near Minima

In the previous section, an abstract formulation of the robust
controller synthesis problem was given, which ultimately leads to the
problem of finding the minimum of some function $f : Y \to [0,1]$.
However, since the problem of finding the absolute minimum is NP-hard,
one needs to find a near minimum of $f(\cdot)$. In the following, we
describe different notions of minima as mentioned in [9].



A. True Minimum 

Definition: Suppose $f : Y \to \mathbb{R}$. Then the true minimum of
$f(\cdot)$ is defined as

$$f^* := \min_{y \in Y} f(y) \tag{4}$$



B. Approximate Near Minimum 

Definition: Suppose $f : Y \to \mathbb{R}$ and that $\epsilon > 0$ is
a given number. A number $f_0 \in \mathbb{R}$ is said to be an
approximate near minimum of $f(\cdot)$ to accuracy $\epsilon$ if

$$\inf_{y \in Y} f(y) - \epsilon \le f_0 \le \inf_{y \in Y} f(y) + \epsilon \tag{5}$$



This is the most intuitive notion of a near minimum. However, as
proved in [16], it is NP-hard even to find such an approximation of
$f^*$. This means that for a given number $\epsilon$ it is NP-hard to
decide whether $|f_0 - \inf_{y \in Y} f(y)| \le \epsilon$ or not. This
makes it necessary to look for other notions of near minima.

C. Probable Near Minimum 

Definition: Suppose $f : Y \to \mathbb{R}$, $P$ is a given probability
measure on the set $Y$ and $\alpha > 0$ is a given number. A number
$f_0$ is said to be a probable near minimum of $f(\cdot)$ to level
$\alpha$ if $f_0 \ge f^*$ and in addition
$P\{y \in Y : f(y) < f_0\} \le \alpha$. Since it is NP-hard to decide
exactly whether a given number $f_0$ is within $\epsilon$ of the true
minimum or not, the most intuitive alternative is to attach a certain
degree of uncertainty to the statement that $f_0$ is a near minimum.
The interpretation is that there could be an exceptional set $S$,
whose measure is at most $\alpha$, such that

$$\inf_{y \in Y} f(y) \le f_0 \le \inf_{y \in Y \setminus S} f(y) \tag{6}$$



D. Probable Approximate Near Minimum 

Definition: Suppose $f : Y \to \mathbb{R}$, that $P_Y$ is a given
probability measure on $Y$ and that $\epsilon, \alpha > 0$ are given
numbers. A number $f_0 \in \mathbb{R}$ is said to be a probably
approximate near minimum of $f(\cdot)$ to accuracy $\epsilon$ and
level $\alpha$ if $f_0 \ge f^* - \epsilon$ and in addition
$P_Y\{y \in Y : f(y) < f_0 - \epsilon\} \le \alpha$.
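
As a quick illustration of definitions C and D, consider a toy example that is not taken from [9]: assume $Y = [0,1]$ with the uniform measure and $f(y) = y^2$, so that $f^* = 0$. The level attained by a candidate $f_0$ can then be computed in closed form:

```latex
\begin{align*}
P\{y \in Y : f(y) < f_0\} &= P\{y < \sqrt{f_0}\} = \sqrt{f_0}, \\
f_0 = 0.01 \;&\Rightarrow\; P\{y \in Y : f(y) < f_0\} = 0.1 .
\end{align*}
```

Under these assumptions $f_0 = 0.01$ is a probable near minimum to level $\alpha = 0.1$. Likewise, with $\epsilon = 0.01$, the value $f_0 = 0.02$ satisfies $P\{y \in Y : f(y) < f_0 - \epsilon\} = 0.1$, so it is a probably approximate near minimum to accuracy $0.01$ and level $0.1$.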

The probable near minimum and the probable approximate near minimum
are relaxations of the true minimum and of the $\epsilon$-accurate
near minimum. Before evaluating either of them, however, one needs to
calculate the expectation of the cost functional for a fixed
controller parameter vector and plant probability distribution $P$.
This requires the computation of a multidimensional volume integral,
which is almost impossible to compute except in some trivial cases.
Moreover, any deterministic numerical technique for calculating such
integrals has complexity that is exponential in the number of
dimensions. Therefore, to calculate the expectation one has to rely on
randomized algorithms, which have strong mathematical support arising
from empirical process theory.
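
To make the idea of replacing the integral by sampling concrete, the following is a minimal sketch under stated assumptions. The scalar plant dx/dt = a x + u with uncertain a, the gain feedback u = -k x, the uniform distribution on [-1, 1] and the indicator cost `psi` are all hypothetical choices made for illustration; none of them comes from the paper.

```python
import random


def psi(a: float, k: float) -> float:
    """Hypothetical cost in [0, 1]: 1 if dx/dt = (a - k) x is unstable, else 0."""
    return 1.0 if a - k >= 0.0 else 0.0


def estimate_f(k: float, m: int, rng: random.Random) -> float:
    """Monte Carlo estimate of f(k) = E_P[psi(G(a), K(k))], a ~ Uniform[-1, 1]."""
    total = 0.0
    for _ in range(m):
        a = rng.uniform(-1.0, 1.0)  # one randomly generated plant instance
        total += psi(a, k)
    return total / m


def exact_f(k: float) -> float:
    """Closed form, available only because this toy problem is one-dimensional."""
    return min(1.0, max(0.0, (1.0 - k) / 2.0))


if __name__ == "__main__":
    rng = random.Random(0)
    print("estimate:", estimate_f(0.5, 10_000, rng), "exact:", exact_f(0.5))
```

The number of samples needed for a given accuracy does not depend on the dimension of the uncertain parameter, which is the property that makes the randomized approach attractive compared with grid-based integration.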

Different notions of approximate near minimum lead to different
randomized algorithms. In the following we briefly describe these
algorithms, details of which can be found in [9] and [7].



V. Role of Empirical Process in Approximate 
Minima Calculation 

Empirical process theory tells us that we can approximate a quantity
with arbitrarily small error from empirical experiments/observations.
To understand how this helps to obtain an unbiased estimate of the
true minimum, and consequently bounds on sample complexities, consider
a binary classification problem where some training examples
$(x_1, y_1), \ldots, (x_m, y_m)$ are given and one needs to learn the
relationship underlying the training samples (the function $h$ with
$h(x) = y$) in order to predict the outcome for a new sample $x$. One
can define an empirical loss $\xi_i := \frac{1}{2}|h(x_i) - y_i|$,
which is either 0 or 1 (provided $h$ is a $\pm 1$ valued function).
All training samples are drawn independently, so we are faced with
independent Bernoulli trials: $\xi_1, \ldots, \xi_m$ are i.i.d.
random variables. A famous inequality due to Chernoff characterizes
how the empirical mean of the loss, $\frac{1}{m}\sum_{i=1}^{m} \xi_i$,
converges to the expected value of the loss $\xi$, denoted $E(\xi)$,
for a particular $h$:

$$P\left\{\left|\frac{1}{m}\sum_{i=1}^{m} \xi_i - E(\xi)\right| \ge \epsilon\right\} \le 2\exp(-2m\epsilon^2) \tag{7}$$



This indicates that the convergence in probability to the true mean is
exponentially fast as the number of observations increases, which is a
quantitative version of the law of large numbers. A similar
inequality, called Hoeffding's inequality, also exists and is stated
as follows.

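Theorem 1 (Hoeffding [11]): Let $\xi_i$, $i \in [m]$, be $m$
independent instances of a bounded random variable $\xi$ with values
in $[a, b]$. Let their average be
$Q_m = \frac{1}{m}\sum_{i=1}^{m} \xi_i$. Then for any $\epsilon > 0$,

$$P\{|Q_m - E(\xi)| \ge \epsilon\} \le 2\exp\left(-\frac{2m\epsilon^2}{(b-a)^2}\right) \tag{8}$$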



From the Chernoff bound one can find how many observations are
required to estimate the unknown quantity $E(\xi)$ to an accuracy of
$\epsilon$ with confidence $1 - \delta$, namely

$$m \ge \frac{1}{2\epsilon^2}\ln\frac{2}{\delta} \tag{9}$$
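
The sample size (9) follows from requiring the right-hand side of the Chernoff bound, $2\exp(-2m\epsilon^2)$, to be at most $\delta$. The sketch below computes it and checks it by simulation on Bernoulli trials; the function names and the choice $p = 0.5$ are illustrative assumptions.

```python
import math
import random


def chernoff_sample_size(eps: float, delta: float) -> int:
    """Smallest integer m with 2*exp(-2*m*eps^2) <= delta, as in (9)."""
    return math.ceil(math.log(2.0 / delta) / (2.0 * eps ** 2))


def failure_frequency(m: int, eps: float, p: float, trials: int,
                      rng: random.Random) -> float:
    """Fraction of trials in which the mean of m Bernoulli(p) draws misses p by >= eps."""
    failures = 0
    for _ in range(trials):
        mean = sum(1 if rng.random() < p else 0 for _ in range(m)) / m
        if abs(mean - p) >= eps:
            failures += 1
    return failures / trials


if __name__ == "__main__":
    m = chernoff_sample_size(eps=0.1, delta=0.05)  # m = 185
    freq = failure_frequency(m, 0.1, 0.5, 2000, random.Random(1))
    print(m, freq)
```

For $\epsilon = 0.1$ and $\delta = 0.05$ the bound gives $m = 185$. The observed failure frequency is typically well below $\delta$, since the bound holds for every distribution on $[0, 1]$ and is therefore not tight for any particular one.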



Ideally, we want to compute $f(y)$ for a given $y$ so that $f(\cdot)$
can be minimized using any suitable method, but except in some trivial
situations it is rather difficult to compute $f(y)$ exactly. Instead,
we can approximate $f(y)$ empirically with an arbitrarily small error.
One can take a collection of $m$ i.i.d. samples $x_1, \ldots, x_m$ in
$X$, generated according to $P$, and define for each function $g_y$,
$y \in Y$, the empirical mean based on the multi-sample
$\mathbf{x} = (x_1, \ldots, x_m)$ as

$$\hat{E}(g_y; \mathbf{x}) := \frac{1}{m}\sum_{j=1}^{m} g_y(x_j) \tag{10}$$



In other words, the actual performance $f(y)$ of a controller $K(y)$
is approximated by its average performance on the $m$ randomly
generated plants.

Fig. 1. The convergence of empirical minima to actual minima. The
horizontal axis gives a one-dimensional representation of the function
class; the vertical axis denotes the minima. For each fixed function
$f$, the law of large numbers tells us that as the sample size goes to
infinity, the empirical minimum $M_{emp}[f]$ converges toward the true
minimum $M[f]$ (indicated by the downward arrow). This does not imply,
however, that in the limit of infinite sample sizes the minimizer of
the empirical minima, $f^m$, will lead to a value that is as good as
the best attainable minimum, $M[f^{opt}]$ (consistency). For the
latter to be true, we require the convergence of $M_{emp}[f]$ towards
$M[f]$ to be uniform over all functions.

One can further try to calculate $\inf_{y \in Y} f(y)$, but there are
two crucial caveats.

First, an important property of the Chernoff bound (or the more
general Hoeffding bound) is that it is probabilistic in nature. It
does not rule out the existence of some functions for which the error
convergence inequality does not hold. To handle such erratically
behaving functions we need a stronger law of large numbers. This means
that to determine the consistency of empirical risk minimization one
needs to consider the worst case over all functions, see Fig. 1. In
other words, the law of large numbers has to be uniform over all such
functions. This insight is due to Vapnik and Chervonenkis in their
seminal work [12], and the following theorem states it.

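Theorem 2 (Vapnik-Chervonenkis): One-sided uniform convergence in
probability,

$$\lim_{m \to \infty} P\left\{\sup_{g_y \in \Gamma} |\hat{f}(y) - f(y)| > \epsilon\right\} = 0 \tag{11}$$

for all $\epsilon > 0$, is a necessary and sufficient condition for
nontrivial consistency of empirical risk minimization. Here
$\hat{f}(y)$ denotes the empirical estimate of $f(y)$ based on $m$
samples and $\Gamma$ is the function class. The condition of uniform
convergence depends on the set of functions for which it must hold.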

Secondly, one cannot possibly evaluate the empirical expectation for
each possible $f(y)$ from the infinite function class $\Gamma$.

VI. Randomized Algorithm for Finding Probably
Approximate Near Minima



Now for the first problem, let us define

$$q(m, \epsilon; \Gamma) := P^m\left\{\mathbf{x} \in X^m : \sup_{g_y \in \Gamma} |\hat{E}(g_y; \mathbf{x}) - E_P(g_y)| > \epsilon\right\} \tag{12}$$



If the family of functions satisfies $q(m, \epsilon; \Gamma) \to 0$ as
$m \to \infty$ for each $\epsilon > 0$, then the family has the
uniform convergence of empirical means (UCEM) property. Depending on
whether the function class possesses the UCEM property, different
randomized algorithms to approximate probable near minima can be
obtained. Assume that the function class $\Gamma$ possesses the UCEM
property. In that case one can choose $m$ large enough according to
(11) such that it can be said with confidence $1 - \delta$ that

$$|f(y) - \hat{E}(g_y; \mathbf{x})| \le \epsilon, \quad \forall y \in Y \tag{13}$$



In other words, the function $\hat{E}(g_y; \mathbf{x})$ is a uniformly
close approximation to the original objective function $f(\cdot)$.
Hence it follows that an exact minimizer of
$\hat{E}(g_y; \mathbf{x})$ is also an approximate near minimizer of
$f(\cdot)$ to accuracy $\epsilon$.

For the second problem, since there might not be a closed-form
expression for $f(y)$, efficient gradient-based minimization methods
cannot be used. One obvious way out is to reduce the infinite function
class to a finite one by selecting $n$ i.i.d. samples of controllers
$g_y$ from $Y$ according to a distribution $Q$ and then calculating
the empirical minimum of $f(y)$ over the sampled $g_y \in \Gamma$.
Since we are not assuming the consistency of the function class
$\Gamma$, this empirical minimum could be far away from the true
minimum of $f(y)$. Specifically, let $g^*$ denote the sampled
controller that gives the lowest value of $f(y)$, and let $c^*$ denote
that value. Also define $M(c^*) := \{g_y \in \Gamma : f(y) < c^*\}$,
i.e. $M(c^*)$ is the set of functions for which $f(y) < c^*$. Then,
given a positive constant $\alpha$, it can be said with confidence
$1 - (1 - \alpha)^n$ that the measure $Q(M(c^*)) \le \alpha$. In other
words, it can be said with confidence $1 - (1 - \alpha)^n$ that the
controller $g^*$ minimizes the performance function over nearly all of
$\Gamma$.

Notice that the confidence in the approximation of the true minimum
involves a double randomization, over both the controller and the
plant family. Therefore, let us split the confidence parameter
$\delta$ into two halves and estimate the corresponding sample
complexities.

Now, according to Hoeffding's inequality,

$$P^m\{\mathbf{x} \in X^m : |\hat{E}(f; \mathbf{x}) - E_P(f)| > \epsilon\} \le 2\exp(-2m\epsilon^2) \tag{14}$$

where $P^m$ is the $m$-fold product of the probability measure $P$.
Now let us find a union bound for the set of samples in $X^m$ for
which the true mean and the empirical mean differ by more than
$\epsilon$ for at least one of the $n$ functions. Therefore, let



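$$C_i^{\epsilon} := \{\mathbf{x} \in X^m : |\hat{E}(f_i; \mathbf{x}) - E_P(f_i)| > \epsilon\} \tag{15}$$

denote the set of samples for which the empirical mean differs from
the true mean by more than $\epsilon$ for a particular $f_i$. Then

$$P(C) = P(C_1^{\epsilon} \cup \cdots \cup C_n^{\epsilon}) \le \sum_{i=1}^{n} P(C_i^{\epsilon}) \tag{16}$$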



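Then, from Hoeffding's inequality,

$$P(C) \le 2n\exp(-2m\epsilon^2) \tag{17}$$

Therefore, from the inequalities

$$(1 - \alpha)^n \le \delta/2 \tag{18}$$

$$2n\exp(-2m\epsilon^2) \le \delta/2 \tag{19}$$

one can derive the sample complexities as

$$n \ge \frac{\ln(2/\delta)}{\ln[1/(1-\alpha)]} \tag{20}$$

$$m \ge \frac{1}{2\epsilon^2}\ln\frac{4n}{\delta} \tag{21}$$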
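
The two sample sizes can be evaluated numerically by solving (18) for $n$ and (19) for $m$. The sketch below does this; the function names and the numerical values of $\epsilon$, $\delta$ and $\alpha$ are illustrative assumptions.

```python
import math


def controller_samples(alpha: float, delta: float) -> int:
    """Smallest n with (1 - alpha)^n <= delta/2, from (18)."""
    return math.ceil(math.log(2.0 / delta) / math.log(1.0 / (1.0 - alpha)))


def plant_samples(eps: float, delta: float, n: int) -> int:
    """Smallest m with 2*n*exp(-2*m*eps^2) <= delta/2, from (19)."""
    return math.ceil(math.log(4.0 * n / delta) / (2.0 * eps ** 2))


if __name__ == "__main__":
    n = controller_samples(alpha=0.1, delta=0.05)
    m = plant_samples(eps=0.1, delta=0.05, n=n)
    print(n, m)  # 36 399
```

With $\epsilon = 0.1$, $\delta = 0.05$ and $\alpha = 0.1$ this gives $n = 36$ controllers and $m = 399$ plants. Note that $m$ grows only logarithmically in $n$, so tightening the level $\alpha$ mainly increases the number of sampled controllers.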



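Next we describe the following algorithm based on the above
arguments.

Algorithm 1:
Choose integers $n \ge \frac{\ln(2/\delta)}{\ln[1/(1-\alpha)]}$ and $m \ge \frac{1}{2\epsilon^2}\ln\frac{4n}{\delta}$.
Generate $m$ i.i.d. samples $x_1, x_2, \ldots, x_m$ from the set $X$ according to $P$, and $n$ i.i.d. samples $y_1, y_2, \ldots, y_n$ from $Y$ according to $P_K$.
For $j \leftarrow 1$ to $n$ do
    calculate $\hat{f}_j = \frac{1}{m}\sum_{i=1}^{m} \psi(G(x_i), K(y_j))$
Return the empirical optimum controller
    $K(y_{j^*})$, $j^* = \arg\min_j \hat{f}_j$
    $f_0 := \min_j \hat{f}_j$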

Then, with confidence $1 - \delta$, it can be said that $f_0$ is a
probably approximate near minimum of $f(\cdot)$ to accuracy $\epsilon$
and level $\alpha$.
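
A minimal Python sketch of Algorithm 1 follows, with the sampled controllers evaluated on a common set of sampled plants. The toy scalar plant, the uniform sampling distributions and the cost `psi` (an instability indicator plus a small gain penalty) are hypothetical assumptions chosen only so that the example runs; they are not part of the original algorithm statement.

```python
import math
import random


def psi(a: float, k: float) -> float:
    """Hypothetical cost in [0, 1] for dx/dt = (a - k) x: 1 if unstable, else 0.1*k."""
    return 1.0 if a - k >= 0.0 else 0.1 * k


def algorithm_1(eps: float, delta: float, alpha: float, rng: random.Random):
    # Sample sizes from (20) and (21): n controllers, m plants.
    n = math.ceil(math.log(2.0 / delta) / math.log(1.0 / (1.0 - alpha)))
    m = math.ceil(math.log(4.0 * n / delta) / (2.0 * eps ** 2))
    plants = [rng.uniform(-1.0, 1.0) for _ in range(m)]  # x_i, assumed uniform
    gains = [rng.uniform(0.0, 2.0) for _ in range(n)]    # y_j, assumed uniform
    best_gain, f0 = None, float("inf")
    for k in gains:
        # Empirical mean of the cost of controller k over the sampled plants.
        f_hat = sum(psi(a, k) for a in plants) / m
        if f_hat < f0:
            best_gain, f0 = k, f_hat
    return best_gain, f0


if __name__ == "__main__":
    gain, f0 = algorithm_1(eps=0.1, delta=0.05, alpha=0.1, rng=random.Random(0))
    print(gain, f0)
```

For this toy cost the true minimum is $f^* = 0.1$, attained at gain $k = 1$, so the returned $f_0$ can be compared against a known reference; in a realistic problem no such reference is available.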

Note that in this algorithm the associated function class $\Gamma$ is
not assumed to have any property or structure; it is a general
algorithm. The caveat is that the empirical minimization of the cost
functional depends heavily on the sampled controller sequence. Also,
as the level parameter $\alpha$ approaches zero, both $n$ and $m$ have
to be increased simultaneously. The performance of the controller
depends directly on the samples.

Now consider the case where the function class $\Gamma$ has the
consistency property. We need to bound the probability
$P\{\sup_{g_y \in \Gamma}(f(y) - \hat{f}(y)) > \epsilon\}$, where
$\hat{f}(y)$ is the empirical estimate of $f(y)$. Consider again the
case of binary classification. Each function of the class separates
the training samples in a certain way and thus induces a certain
labelling of the samples. Since the labels are in $\{\pm 1\}$, there
are at most $2^m$ different labellings for $m$ samples. A very rich
function class might be able to realize all $2^m$ separations, in
which case it is said to shatter the $m$ samples. However, the given
class of functions might not be sufficiently rich to shatter the $m$
points. The Vapnik-Chervonenkis dimension, or VC dimension, is defined
as the largest $m$ such that there exists a set of $m$ points which
the class can shatter. A necessary and sufficient condition for a
collection of sets $\{g_y \in \Gamma\}$ to have the uniform
convergence of empirical means property under any probability
distribution $P$ is that its VC dimension be finite. This means that
no matter how we choose the $m$ controllers, the empirical mean
converges towards the true mean as the number of samples tends to
infinity. The convergence property now does not depend on the sampled
controller sequence. Based on this we can construct another randomized
algorithm. We begin by partitioning the confidence parameter $\delta$
into two parts as in the previous case. Since the collection of sets
$\{g_y \in \Gamma\}$ has the uniform convergence of empirical means
property, it has a finite VC dimension $d$. Then, as proved in [9], if
we draw at least

$$n \ge \max\left\{\frac{16}{\epsilon^2}\ln\frac{8}{\delta},\; \frac{32d}{\epsilon^2}\ln\frac{32e}{\epsilon^2}\right\} \tag{22}$$



i.i.d. samples of plants, then we can say with confidence
$1 - \delta/2$ that each empirical estimate $\hat{f}(y)$ is within
$\epsilon$ of the corresponding true value $f(y)$. Next let us choose
an integer $m$ large enough that $(1 - \alpha)^m \le \delta/2$, or

$$m \ge \frac{\ln(2/\delta)}{\ln(1/(1-\alpha))} \tag{23}$$

Choose i.i.d. samples $g_{y_1}, \ldots, g_{y_m}$ distributed according
to any $Q$ and pick the $g_{y_j}$ that gives the minimum of the
empirical estimates of $f(y)$. This empirical minimum is, with
confidence $1 - \delta/2$, a near minimum of $f(y)$. Combining both
statements shows that this procedure gives a near minimum of $f(y)$ to
accuracy $\epsilon$, confidence $1 - \delta$, and level $\alpha$. In
the following we present the algorithm based on the above arguments.



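Algorithm 2:
Choose integers $m \ge \frac{\ln(2/\delta)}{\ln[1/(1-\alpha)]}$ and $n \ge \max\left\{\frac{16}{\epsilon^2}\ln\frac{8}{\delta},\; \frac{32d}{\epsilon^2}\ln\frac{32e}{\epsilon^2}\right\}$.
Generate $n$ i.i.d. samples $x_1, x_2, \ldots, x_n$ from the set $X$ according to $P$, and $m$ i.i.d. samples $y_1, y_2, \ldots, y_m$ from $Y$ according to $P_K$.
For $j \leftarrow 1$ to $m$ do
    calculate $\hat{f}_j = \frac{1}{n}\sum_{i=1}^{n} \psi(G(x_i), K(y_j))$
Return the empirical optimum controller
    $K(y_{j^*})$, $j^* = \arg\min_j \hat{f}_j$
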
There can be two possible sources of conservatism in the estimates of
sample complexity: the estimate of the VC dimension $d$ itself, or the
estimate of the sample complexity for a given VC dimension $d$.
Therefore, there is considerable room for improvement in the estimate,
as shown in [13].
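
To give a feeling for this conservatism, the sketch below evaluates the plant sample size required by (22), assuming it reads $n \ge \max\{(16/\epsilon^2)\ln(8/\delta),\, (32d/\epsilon^2)\ln(32e/\epsilon^2)\}$, and compares it with the Hoeffding-based bound (21). The value $d = 5$ and the other numbers are arbitrary illustrative choices.

```python
import math


def vc_plant_samples(eps: float, delta: float, d: int) -> int:
    """Plant sample size from the VC-dimension bound (22), as assumed above."""
    first = (16.0 / eps ** 2) * math.log(8.0 / delta)
    second = (32.0 * d / eps ** 2) * math.log(32.0 * math.e / eps ** 2)
    return math.ceil(max(first, second))


def hoeffding_plant_samples(eps: float, delta: float, n_controllers: int) -> int:
    """Plant sample size from the union-bound argument (21), for comparison."""
    return math.ceil(math.log(4.0 * n_controllers / delta) / (2.0 * eps ** 2))


if __name__ == "__main__":
    print(vc_plant_samples(eps=0.1, delta=0.05, d=5))
    print(hoeffding_plant_samples(eps=0.1, delta=0.05, n_controllers=36))
```

For these values the VC-based requirement is on the order of $10^5$ plant samples, against a few hundred for the bound (21), which illustrates why sharper estimates are of practical interest.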

VII. Randomized Algorithm for Finding 
Probable Near Maxima 

Efficient algorithms based on the concept of probable near minimum
were introduced by Tempo et al. [15] and Khargonekar and Tikku [14].
Here we briefly describe their approach.

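Define the distribution function of the random variable $f(y)$ under
$P$ as follows. For each $a \in \mathbb{R}$ let

$$r(a) := P\{y \in Y : f(y) \le a\} \tag{24}$$

and, for an i.i.d. multi-sample $\mathbf{y} = (y_1, \ldots, y_m)$, let

$$\bar{f}(\mathbf{y}) := \max_{1 \le i \le m} f(y_i) \tag{25}$$

Given $\epsilon > 0$, define
$a_\epsilon := \inf\{a : r(a) \ge 1 - \epsilon\}$. This means
$r(a) < 1 - \epsilon$ if $a < a_\epsilon$. Now suppose
$\bar{f}(\mathbf{y}) \ge a_\epsilon$; then

$$P\{y \in Y : f(y) \le \bar{f}(\mathbf{y})\} = r[\bar{f}(\mathbf{y})] \ge 1 - \epsilon \tag{26}$$

or,

$$P\{y \in Y : f(y) > \bar{f}(\mathbf{y})\} = 1 - r[\bar{f}(\mathbf{y})] \le \epsilon \tag{27}$$

Therefore, if
$P\{y \in Y : f(y) > \bar{f}(\mathbf{y})\} = 1 - r[\bar{f}(\mathbf{y})] > \epsilon$,
then $\bar{f}(\mathbf{y}) < a_\epsilon$ and
$r[\bar{f}(\mathbf{y})] < 1 - \epsilon$. Since
$\bar{f}(\mathbf{y}) < a_\epsilon$, we have $f(y_i) < a_\epsilon$ for
every $i$. Each of these $m$ independent events has probability at
most $1 - \epsilon$, so taking the $m$-fold product probability gives

$$P^m\{\mathbf{y} \in Y^m : P\{f(y) > \bar{f}(\mathbf{y})\} > \epsilon\} \le (1 - \epsilon)^m \tag{28}$$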



Now choosing an integer $m$ such that $(1 - \epsilon)^m \le \delta$
leads to a lower bound on $m$:

$$m \ge \frac{\ln(1/\delta)}{\ln[1/(1-\epsilon)]} \tag{29}$$



Therefore, if $m$ satisfies this lower bound, it can be said with
confidence at least $1 - \delta$ that the sample maximum
$\bar{f}(\mathbf{y}) = \max_{1 \le i \le m} f(y_i)$ is a probable near
maximum of $f(\cdot)$ to level $\epsilon$; the same argument with
$\max$ replaced by $\min$ and the inequalities reversed gives a
probable near minimum. Notice that the bound on the sample complexity
is independent of the number of uncertain parameters and of the
density function of $f(y)$. This is the best possible estimate for
$m$. Here we present the following algorithm based on the above
argument.



Algorithm 3:
Choose an integer $m \ge \frac{\ln(1/\delta)}{\ln[1/(1-\epsilon)]}$.
Generate $m$ i.i.d. samples $y_1, y_2, \ldots, y_m$ from $Y$ according to $P$.
For $i \leftarrow 1$ to $m$ do
    calculate $f_i = \psi(G(x_i), K(y_i))$
Return the empirical optimum controller
    $K(y_{i^*})$, $i^* = \arg\min_{1 \le i \le m} f_i$

Notice that in this algorithm only controller randomization is 
used. In this sense the algorithm computes probabilistically 
worst case performance bounds instead of an average case 
bound. As one can expect this reduces the sample complexity 
greatly as reported in [15]. 
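
The controller-only randomization can be sketched as follows. This is an illustration under stated assumptions rather than the implementation of [14] or [15]: the cost `toy_f` is a hypothetical closed-form $f(y)$ for a scalar example, the controller parameter is sampled uniformly from $[-1, 1]$, and the minimum (rather than the maximum) of the sampled values is returned.

```python
import math
import random


def near_extremum_samples(eps: float, delta: float) -> int:
    """Smallest m with (1 - eps)^m <= delta, as in (29)."""
    return math.ceil(math.log(1.0 / delta) / math.log(1.0 / (1.0 - eps)))


def algorithm_3(f, sample_y, eps: float, delta: float, rng: random.Random):
    """Sample m controllers and return the one with the smallest cost f."""
    m = near_extremum_samples(eps, delta)
    candidates = [sample_y(rng) for _ in range(m)]
    best = min(candidates, key=f)
    return best, f(best), m


def toy_f(k: float) -> float:
    """Hypothetical cost: probability that a ~ Uniform[-1, 1] exceeds the gain k."""
    return min(1.0, max(0.0, (1.0 - k) / 2.0))


if __name__ == "__main__":
    rng = random.Random(0)
    best, f0, m = algorithm_3(toy_f, lambda r: r.uniform(-1.0, 1.0), 0.05, 0.01, rng)
    print(m, best, f0)  # m = 90 for eps = 0.05, delta = 0.01
```

For $\epsilon = 0.05$ and $\delta = 0.01$ only $m = 90$ controller samples are needed, regardless of the dimension of $Y$, which illustrates the dimension independence noted above.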

VIII. CONCLUSIONS AND FUTURE WORK 

In this paper we reviewed several existing randomized algorithms,
based on empirical process theory, for the probabilistic robust
controller synthesis problem. The sample complexities of Algorithms 1
and 3 are found to be the best possible, while there is considerable
scope to improve the sample complexity bounds for Algorithm 2.